[ML] Support revoking inference default endpoint authorization #121326

jonathan-buttner · 2025-01-30T19:29:49Z

This PR implements the ability to revoke authorization for default endpoints. When a node boots up, it makes a call to the EIS gateway. If the gateway returns a list of model ids that does not include one of the default endpoint models, we revoke authorization for that model. To revoke authorization, we instruct the ModelRegistry to remove from its in-memory cache the default endpoints that were absent from the authorization response. The ModelRegistry then performs a delete-by-query to remove the corresponding inference endpoint ids.

I also added a check in the deletion code path to prevent users from deleting default endpoint ids.
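
As a rough illustration of the revocation decision described above, the set of endpoints to revoke is the default model ids minus whatever the gateway authorized. This is a minimal sketch, not the PR's code: `RevocationSketch` and `modelIdsToRevoke` are hypothetical names, and model ids are assumed to be plain strings.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not taken from the PR: works out which default endpoints lost authorization.
final class RevocationSketch {
    /** Returns the default model ids that are absent from the gateway's authorized list. */
    static Set<String> modelIdsToRevoke(Set<String> defaultModelIds, List<String> authorizedModelIds) {
        Set<String> revoked = new LinkedHashSet<>(defaultModelIds);
        // Anything the gateway still authorizes stays; the remainder is revoked.
        authorizedModelIds.forEach(revoked::remove);
        return revoked;
    }
}
```

An empty authorization response would, under this sketch, mark every default model id for revocation.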

jonathan-buttner · 2025-01-30T20:25:31Z

...in/java/org/elasticsearch/xpack/inference/action/TransportDeleteInferenceEndpointAction.java

        ClusterState state,
        ActionListener<DeleteInferenceEndpointAction.Response> masterListener
    ) {
+        if (modelRegistry.containsDefaultConfigId(request.getInferenceEndpointId())) {


This change will prevent REST calls from deleting default inference endpoints.
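
The shape of that guard can be sketched roughly as below. `DefaultEndpointGuardSketch` is a hypothetical stand-in for the registry lookup, and the exception type and message are assumptions for illustration; the real action reports the failure through its listener.

```java
import java.util.Set;

// Hypothetical stand-in for the registry; only the default-id lookup is modeled.
final class DefaultEndpointGuardSketch {
    private final Set<String> defaultConfigIds;

    DefaultEndpointGuardSketch(Set<String> defaultConfigIds) {
        this.defaultConfigIds = Set.copyOf(defaultConfigIds);
    }

    /** Rejects deletion requests that target a default endpoint id. */
    void validateDelete(String inferenceEndpointId) {
        if (defaultConfigIds.contains(inferenceEndpointId)) {
            // Assumed exception type and wording, chosen only for this sketch.
            throw new IllegalArgumentException(
                "[" + inferenceEndpointId + "] is a default endpoint and cannot be deleted"
            );
        }
    }
}
```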

jonathan-buttner · 2025-01-30T20:26:02Z

...plugin/inference/src/main/java/org/elasticsearch/xpack/inference/registry/ModelRegistry.java

        }));
    }

+    // TODO should we add a lock on the default model id so we can't attempt to delete it while we're adding it?


Should we add the default model ids to the lock map while they're being persisted?

I'm fine with leaving it as is? I'm thinking the worst case scenario is:

Thread 1 starts persist

Thread 2 starts delete

Thread 2 finishes delete

Thread 1 finishes persist

But the next call will just redo the delete? And we won't get stuck in some sort of a delete -> put loop? Maybe? The only risk that I see is that the storeDefaultEndpoint is potentially called as part of _xpack/usage, so it is running frequently in some environments.

Yeah now that I think about it more I'm not even sure that's possible at least at the moment. The only way we would add a new default endpoint that has a possibility of being deleted would be when a node boots up for EIS. A single node wouldn't try to add the EIS default endpoints and delete them.

Two separate nodes could run into a situation, though, where node A gets the authorization response and begins persisting the default endpoints. Node B boots up, but its authorization request fails for some reason, so it then attempts to delete the default endpoints.

It looks like we're not aborting conflicts though so I'd expect one of those calls to succeed and then when another node restarts it would remove the default endpoint eventually.
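
To make the ordering argument in this thread concrete, here is a small in-memory replay of the worst case followed by one later revocation pass. It is only a sketch under loud assumptions: `PersistDeleteRaceSketch` is hypothetical, the "index" is a plain set, the steps run sequentially rather than on real threads, and Elasticsearch version conflicts are not modeled.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// In-memory stand-in for the inference index; all names here are hypothetical.
final class PersistDeleteRaceSketch {
    private final Set<String> storedIds = ConcurrentHashMap.newKeySet();

    void persist(String id) { storedIds.add(id); }

    // Removing an absent id is a no-op rather than an error.
    void delete(String id) { storedIds.remove(id); }

    boolean contains(String id) { return storedIds.contains(id); }

    /** Replays the worst-case ordering, then one later revocation; reports whether the id survives. */
    static boolean survivesAfterNextRevocation(String id) {
        var index = new PersistDeleteRaceSketch();
        index.delete(id);   // the delete finishes first and finds nothing to remove
        index.persist(id);  // the persist lands afterwards, leaving a stale endpoint behind
        index.delete(id);   // a later authorization check revokes again and cleans it up
        return index.contains(id);
    }
}
```

Under these assumptions the stale endpoint only lives until the next revocation pass, which matches the "eventually removed" expectation above.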

jonathan-buttner · 2025-01-30T20:27:29Z

...plugin/inference/src/main/java/org/elasticsearch/xpack/inference/registry/ModelRegistry.java

        DeleteByQueryRequest request = new DeleteByQueryRequest().setAbortOnVersionConflict(false);
        request.indices(InferenceIndex.INDEX_PATTERN, InferenceSecretsIndex.INDEX_PATTERN);
-        request.setQuery(documentIdQuery(inferenceEntityId));
+        request.setQuery(documentIdsQuery(inferenceEntityIds));


Interesting side note: delete-by-query will not fail if it can't find an id. That is helpful here because, when revoking authorization, we don't really care whether the model already exists, just that it is removed if it did exist.

jonathan-buttner · 2025-01-30T20:28:20Z

...rc/main/java/org/elasticsearch/xpack/inference/services/elastic/ElasticInferenceService.java

+            authorizationCompletedLatch.countDown();
+        });
+
+        getServiceComponents().threadPool()


This is being called from the onResponse of an action listener. Which is probably already on a utility thread but just in case we'll kick off a new one.
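
The hand-off pattern mentioned here can be sketched with plain `java.util.concurrent` types. This is not the service's code: `CallbackHandoffSketch` is a hypothetical name, a `Consumer<String>` stands in for the action listener's `onResponse`, and a generic `ExecutorService` stands in for the thread pool.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.function.Consumer;

// Hypothetical sketch: a callback that defers follow-up work to a separate executor.
final class CallbackHandoffSketch {
    /**
     * Returns a callback that schedules the follow-up on the executor instead of running it
     * on the caller's thread. The future completes with the name of the thread that ran it.
     */
    static Consumer<String> onResponse(ExecutorService executor, CompletableFuture<String> ranOn) {
        return response -> executor.execute(() -> ranOn.complete(Thread.currentThread().getName()));
    }
}
```

Whichever thread invokes the callback, the follow-up work runs on the executor's thread, so a slow follow-up cannot block the thread that delivered the response.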

jonathan-buttner · 2025-01-30T20:28:41Z

...rc/main/java/org/elasticsearch/xpack/inference/services/elastic/ElasticInferenceService.java

+     * @param waitTime the max time to wait
+     * @throws IllegalStateException if the wait time is exceeded or the call receives an {@link InterruptedException}
+     */
+    public void waitForAuthorizationToComplete(TimeValue waitTime) {


Making this public so I can access it in the integration tests that were created.
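
For readers unfamiliar with the pattern, a latch-backed wait that matches the Javadoc contract above might look roughly like this. It is a sketch under assumptions: `AuthorizationWaitSketch` and `markAuthorizationComplete` are hypothetical, the wait time is a plain millisecond value instead of `TimeValue`, and the exception messages are invented.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of a latch-backed wait; the real method takes a TimeValue.
final class AuthorizationWaitSketch {
    private final CountDownLatch authorizationCompletedLatch = new CountDownLatch(1);

    void markAuthorizationComplete() {
        authorizationCompletedLatch.countDown();
    }

    /** Blocks until authorization completes; throws IllegalStateException on timeout or interrupt. */
    void waitForAuthorizationToComplete(long waitMillis) {
        try {
            if (!authorizationCompletedLatch.await(waitMillis, TimeUnit.MILLISECONDS)) {
                throw new IllegalStateException("Timed out waiting for authorization to complete");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // keep the interrupt flag set for callers
            throw new IllegalStateException("Interrupted while waiting for authorization to complete", e);
        }
    }
}
```

An integration test can call the wait method before asserting on registry state, so assertions do not race the asynchronous authorization call.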

jonathan-buttner · 2025-01-30T20:32:48Z

@elasticmachine merge upstream

elasticmachine · 2025-01-30T20:32:51Z

There are no new commits on the base branch.

elasticsearchmachine · 2025-01-31T15:08:24Z

Pinging @elastic/ml-core (Team:ML)

prwhelan · 2025-02-04T19:08:38Z

.../internalClusterTest/java/org/elasticsearch/xpack/inference/integration/ModelRegistryIT.java

+            @SuppressWarnings("unchecked")
+            var listener = (ActionListener<List<Model>>) invocation.getArguments()[0];


There's another API that does the implicit typecasting magic, if you prefer:

ActionListener<List<Model>> listener = invocation.getArgument(0);

prwhelan · 2025-02-04T19:13:18Z

...plugin/inference/src/main/java/org/elasticsearch/xpack/inference/registry/ModelRegistry.java

        }));
    }

+    // TODO should we add a lock on the default model id so we can't attempt to delete it while we're adding it?


I'm fine with leaving it as is? I'm thinking the worst case scenario is:

Thread 1 starts persist

Thread 2 starts delete

Thread 2 finishes delete

Thread 1 finishes persist

But the next call will just redo the delete? And we won't get stuck in some sort of a delete -> put loop? Maybe? The only risk that I see is that the storeDefaultEndpoint is potentially called as part of _xpack/usage, so it is running frequently in some environments.

…ic#121326) * Starting revoke * Adding integration tests * More integration tests * Adding test for deleting default inference endpoint via rest call * Removing task type any * Addressing feedback and adding test

elasticsearchmachine · 2025-02-06T13:29:56Z

💚 Backport successful

Status	Branch	Result
✅	9.0
✅	8.18
✅	8.x

…121326) (#121906) * [ML] Support revoking inference default endpoint authorization (#121326) * Starting revoke * Adding integration tests * More integration tests * Adding test for deleting default inference endpoint via rest call * Removing task type any * Addressing feedback and adding test * Fixing tests

…#121326) (#121907) * [ML] Support revoking inference default endpoint authorization (#121326) * Starting revoke * Adding integration tests * More integration tests * Adding test for deleting default inference endpoint via rest call * Removing task type any * Addressing feedback and adding test * Fixing tests

…121326) (#121908) * [ML] Support revoking inference default endpoint authorization (#121326) * Starting revoke * Adding integration tests * More integration tests * Adding test for deleting default inference endpoint via rest call * Removing task type any * Addressing feedback and adding test * Fixing tests

jonathan-buttner added 3 commits January 29, 2025 17:02

Starting revoke

25828f5

Adding integration tests

7b760b8

More integration tests

b9e20b6

jonathan-buttner added >non-issue :ml Machine learning Team:ML Meta label for the ML team auto-backport Automatically create backport pull requests when merged Feature:GenAI Features around GenAI v9.0.0 v8.18.0 v8.19.0 labels Jan 30, 2025

elasticsearchmachine added the v9.1.0 label Jan 30, 2025

jonathan-buttner added 2 commits January 30, 2025 15:24

Adding test for deleting default inference endpoint via rest call

0a04210

Merge branch 'main' of github.com:elastic/elasticsearch into ml-handl…

93fbeeb

…e-auth-revoke

jonathan-buttner commented Jan 30, 2025

jonathan-buttner added 2 commits January 30, 2025 16:43

Merge branch 'main' into ml-handle-auth-revoke

a61cf62

Merge branch 'main' into ml-handle-auth-revoke

464a370

jonathan-buttner marked this pull request as ready for review January 31, 2025 15:08

jonathan-buttner requested a review from davidkyle January 31, 2025 15:08

jonathan-buttner added 2 commits February 4, 2025 12:52

Merge branch 'main' of github.com:elastic/elasticsearch into ml-handl…

e9a9e2c

…e-auth-revoke

Removing task type any

7e452a3

prwhelan approved these changes Feb 4, 2025

Addressing feedback and adding test

c5f8ef6

jonathan-buttner added v8.18.1 v9.0.1 labels Feb 5, 2025

jonathan-buttner merged commit 671ecd0 into elastic:main Feb 6, 2025
17 checks passed

jonathan-buttner deleted the ml-handle-auth-revoke branch February 6, 2025 13:28

jonathan-buttner mentioned this pull request Feb 6, 2025

[9.0] [ML] Support revoking inference default endpoint authorization (#121326) #121906

Merged

This was referenced Feb 6, 2025

[8.18] [ML] Support revoking inference default endpoint authorization (#121326) #121907

Merged

[8.x] [ML] Support revoking inference default endpoint authorization (#121326) #121908

Merged

4 participants